Exploring Adaptor Grammars for Native Language Identification
Abstract
The task of inferring the native language of an author based on texts written in a second language has generally been tackled as a classification problem, typically using as features a mix of n-grams over characters and part of speech tags (for small and fixed n) and unigram function words. To capture arbitrarily long n-grams that syntax-based approaches have suggested are useful, adaptor grammars have some promise. In this work we investigate their extension to identifying n-gram collocations of arbitrary length over a mix of PoS tags and words, using both maxent and induced syntactic language model approaches to classification. After presenting a new, simple baseline, we show that learned collocations used as features in a maxent model perform better still, but that the story is more mixed for the syntactic language model.
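As a rough illustration of the conventional feature set the abstract describes (fixed-length n-grams over characters and PoS tags, plus unigram function words), the sketch below builds a feature-count dictionary for one document. It is not the authors' implementation: the function names, the `char:`/`pos:`/`fw:` feature prefixes, the tiny `FUNCTION_WORDS` list, and the default values of n are all assumptions made for illustration, and the PoS tags are assumed to come from an external tagger.

```python
from collections import Counter

# Hypothetical, deliberately tiny function-word list (illustration only).
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "but", "that"}


def char_ngrams(text, n):
    """Count overlapping character n-grams of fixed length n."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def tag_ngrams(tags, n):
    """Count n-grams over a sequence of PoS tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def function_word_unigrams(tokens):
    """Count case-folded tokens that appear in the function-word list."""
    return Counter(t.lower() for t in tokens if t.lower() in FUNCTION_WORDS)


def extract_features(tokens, tags, char_n=3, tag_n=2):
    """Merge the three feature families into one prefixed count dictionary."""
    text = " ".join(tokens)
    features = Counter()
    for gram, count in char_ngrams(text, char_n).items():
        features["char:" + gram] = count
    for gram, count in tag_ngrams(tags, tag_n).items():
        features["pos:" + "_".join(gram)] = count
    for word, count in function_word_unigrams(tokens).items():
        features["fw:" + word] = count
    return features
```

A dictionary of this kind could then be passed to any maxent (multinomial logistic regression) classifier. The collocations learned by an adaptor grammar would differ in that their length is not fixed in advance.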
Similar resources
Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models
This paper introduces adaptor grammars, a class of probabilistic models of language that generalize probabilistic context-free grammars (PCFGs). Adaptor grammars augment the probabilistic rules of PCFGs with “adaptors” that can induce dependencies among successive uses. With a particular choice of adaptor, based on the Pitman-Yor process, nonparametric Bayesian models of language using Dirichle...
Native Language Detection with Tree Substitution Grammars
We investigate the potential of Tree Substitution Grammars as a source of features for native language detection, the task of inferring an author’s native language from text in a different language. We compare two state of the art methods for Tree Substitution Grammar induction and show that features from both methods outperform previous state of the art results at native language detection. Fu...
Variational Inference for Adaptor Grammars
Adaptor grammars extend probabilistic context-free grammars to define prior distributions over trees with “rich get richer” dynamics. Inference for adaptor grammars seeks to find parse trees for raw text. This paper describes a variational inference algorithm for adaptor grammars, providing an alternative to Markov chain Monte Carlo methods. To derive this method, we develop a stick-breaking re...
Exploring Male and Female Iranian EFL Learners’ Attitude towards Native and Non-native Varieties of English
This study investigated whether Iranian EFL learners are aware of the different varieties of English spoken throughout the world and whether they have a tendency towards a particular variety of English. Likewise, it explored the attitudes of Iranian EFL learners towards the native and non-native varieties of English. Moreover, it made an attempt to investigate whether such attitudes are gender-orient...
Extending the Use of Adaptor Grammars for Unsupervised Morphological Segmentation of Unseen Languages
We investigate using Adaptor Grammars for unsupervised morphological segmentation. Using six development languages, we investigate in detail different grammars, the use of morphological knowledge from outside sources, and the use of a cascaded architecture. Using cross-validation on our development languages, we propose a system which is language-independent. We show that it outperforms two sta...